Abstract

Learning-based model predictive control (LBMPC) is a technique that provides
deterministic guarantees on robustness, while using statistical identification tools to
identify richer models of the system in order to improve performance. This technical
note provides proofs that explain our choice of measurement model, as well as proofs
concerning the stochastic convergence of LBMPC. The first part of this note discusses
simultaneous state estimation and statistical identification (or learning) of unmodeled
dynamics, for dynamical systems that can be described by ordinary differential equations
(ODEs). The second part provides proofs concerning the epi-convergence of different
statistical estimators that can be used with LBMPC. In particular, we prove results on the
statistical properties of a nonparametric estimator that we have designed to have the
correct deterministic and stochastic properties for numerical implementation when used in
conjunction with LBMPC.



1 Introduction 

This technical note is meant to be understood in the context of [3], and it consists of two
distinct parts. Sections 2 and 3 concern simultaneous state estimation and statistical
identification (or learning) of unmodeled dynamics, for dynamical systems that can be
described by ordinary differential equations (ODEs). The second part is found in Section 4
and provides proofs concerning the epi-convergence of different statistical estimators that
can be used with the learning-based model predictive control (LBMPC) technique.

For the results on estimation and learning, we assume that for state vector $x \in \mathbb{R}^p$,
control input $u \in \mathbb{R}^m$, and output $y \in \mathbb{R}^q$, the system dynamics are given by the following
ODE:

$$\dot{x} = A_c x + B_c u + g_c(x,u), \qquad y = Cx, \qquad (1)$$

where $A_c, B_c, C$ are matrices of appropriate size and $g_c(x,u)$ describes the unmodeled (possibly
nonlinear) dynamics. We will assume that the control inputs generated by the model
predictive control (MPC) schemes are piecewise constant:

$$u(t) = u_m, \quad \forall t \in [mT_u, (m+1)T_u), \qquad (2)$$

where $T_u$ is the sampling period of the input. Note that appropriately designed MPC can
generate other control schemes, such as piecewise linear inputs.



2 Limitations on Filtering and Learning 

We begin with a negative result about the inability of filters to estimate both the state and
the unmodeled dynamics, for a general system in which not all states are measured. (This result
does not apply to systems with special structure, such as in [2].) This limitation applies to
situations in which the unmodeled dynamics are described by a series expansion with a constant
term, and so it is relevant to a wide class of systems and filtering approaches.

Suppose the unmodeled dynamics are parameterized as $g_c(x,u) = \gamma(x,u;\theta) + \kappa$, where $\kappa$
is a constant, non-zero vector and $\gamma(x,u;\theta)$ is a parameterized function such that
$\gamma(x,u;\theta_0) = 0$ for some parameter value $\theta_0$. We note that this includes the situation in
which $g_c$ is given by a series expansion (e.g., Taylor polynomial, Fourier series, etc.).

The intuition is that statistical identification (or learning) of the parameters $\theta, \kappa$ and
estimation of the state $x$ can be cast into the framework of observability of an augmented
dynamical system. The augmented system has $y = Cx$ and dynamics



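$$\frac{d}{dt}\begin{bmatrix} x \\ \kappa \\ \theta \end{bmatrix} = \begin{bmatrix} A_c x + B_c u + \gamma(x,u;\theta) + \kappa \\ 0 \\ 0 \end{bmatrix}. \qquad (3)$$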



When not all states are measured and there is no special structure on $\kappa$, this augmented
system is not observable. This means that $(x, \kappa, \theta)$ cannot be simultaneously estimated using
measurements of the system output $y$. This is formalized by the following theorem.

Theorem 1. A necessary condition for the observability (and detectability) of the system
given in (3) with $y = Cx$ is that $\operatorname{rank}(C) = p$.

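Proof. Suppose $\theta = \theta_0$, which makes $\gamma(x,u;\theta) = 0$. Then the system is linear and
time-invariant (LTI) in the augmented state $(x, \kappa) \in \mathbb{R}^{2p}$. Using the Popov-Belevitch-Hautus
(PBH) test, the system is detectable if and only if $\operatorname{rank}(\mathcal{O}(s)) = p + p = 2p$ for all
$s \in \mathbb{C}$ with $\operatorname{Re}(s) \geq 0$ (and observable if and only if this holds for all $s \in \mathbb{C}$), where

$$\mathcal{O}(s) = \begin{bmatrix} sI - A_c & -I \\ 0 & sI \\ C & 0 \end{bmatrix}. \qquad (4)$$

If $s = 0$, then the block $sI$ is zero, and the block structure of (4) implies that
$\operatorname{rank}(\mathcal{O}(0)) = p + \operatorname{rank}(C)$: the $-I$ block contributes rank $p$ and, through column
operations, eliminates $-A_c$, leaving $C$ as the only other nonzero block. The system is therefore
not observable (and not detectable) when $\operatorname{rank}(C) < p$, establishing necessity. □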
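The rank condition in the proof can be checked numerically. The sketch below is a minimal illustration, not part of the original analysis: it builds the PBH matrix (4) for a hypothetical second-order system and computes its rank exactly with `fractions.Fraction`. The matrices `A`, `C_partial`, `C_full` and the helper names `rank` and `pbh_matrix` are assumptions made for the example.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix given as a list of rows, via exact Gauss-Jordan elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0
    for c in range(ncols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def pbh_matrix(A, C, s):
    """Stack [[sI - A_c, -I], [0, sI], [C, 0]] for the augmented state (x, kappa)."""
    p = len(A)

    def eye(i, j):
        return 1 if i == j else 0

    top = [[s * eye(i, j) - A[i][j] for j in range(p)] + [-eye(i, j) for j in range(p)]
           for i in range(p)]
    middle = [[0] * p + [s * eye(i, j) for j in range(p)] for i in range(p)]
    bottom = [list(row) + [0] * p for row in C]
    return top + middle + bottom


A = [[0, 1], [-2, -3]]          # hypothetical A_c with p = 2
C_partial = [[1, 0]]            # only the first state is measured: rank(C) = 1
C_full = [[1, 0], [0, 1]]       # full state measured: rank(C) = 2
print(rank(pbh_matrix(A, C_partial, 0)))  # 3 = p + rank(C) < 2p, so not detectable
print(rank(pbh_matrix(A, C_full, 0)))     # 4 = 2p
```

Note that the rank deficiency appears only at $s = 0$ for this example; at $s = 1$ the same partially measured system gives full rank, which is why the proof evaluates (4) at $s = 0$.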

Remark. This result also applies to discrete time systems, and the proof is nearly identical. 

In light of this negative result concerning filters, we require that $C$ have full column rank,
i.e., $\operatorname{rank}(C) = p$. Without loss of generality, we assume that the full state $x$ is measured.
3 Nonparametric Filtering for Dynamical Systems



The design of a Kalman filter for systems with unmodeled dynamics can be complex, and
so we propose a nonparametric regression approach for estimating the state. Available
approaches include local polynomial regression (LPR) and spline smoothing; the
Savitzky-Golay filter [14] is technically a finite impulse response (FIR) filter implementation
of LPR. We design a new nonparametric filter, and one advantage of this filter is that it is
easy to compute, because it is a weighted sum of measurements.

An important point to note is that the statistical guarantees provided by our filter are
not the same as those for a Kalman filter. A Kalman filter is defined to be consistent if its
state estimates are unbiased and the true error covariance is smaller than the estimated error
covariance (covariance matrices are positive semi-definite, and so a partial order can be
defined). In our method, consistency is defined with respect to the sampling period $T_s$ of
state measurements: as $T_s \to 0$, the estimates converge to the true values in probability.
This philosophical change is necessary in order to use nonparametric statistics; otherwise, we
would be forced to use a parametric model of the unmodeled dynamics.

We begin with a lemma about the differentiability of the state trajectory x(t) when the 
inputs are piecewise constant. 

Lemma 1. Suppose $g_c(x,u)$ is $(Q-1)$-times differentiable. For $m \in \mathbb{Z}$, the trajectory $x(t)$
which solves the ODE in (1) is once-differentiable everywhere, $Q$-times differentiable at
$t \neq mT_u$, and not twice-differentiable at $t = mT_u$.

Proof. The first time-derivative of $x(t)$ is given by (1), by definition. Because the inputs
are piecewise constant, the input $u(t)$ is not differentiable at $t = mT_u$. Because the first
time-derivative of $x(t)$ is a function of $u(t)$, this means that $x(t)$ is not twice-differentiable
at $t = mT_u$. Recall that $u(t)$ is constant on each interval $(mT_u, (m+1)T_u)$; thus, $u(t)$ is smooth
at $t \neq mT_u$. This implies that $x(t)$ is $Q$-times differentiable at $t \neq mT_u$, because $g_c(x,u)$ is
$(Q-1)$-times differentiable. □

Remark. These qualitative features mean that we cannot use LPR methods with order higher
than zero (order zero corresponds to the Nadaraya-Watson estimator) without modifying the
filtering scheme. This is an important point, because the differentiability of the trajectory
$x(t)$ makes it tempting to use LPR. Yet no theoretical convergence guarantees can currently be
made in such a situation, and the behavior of these filters may be unpredictable.

In light of these restrictions, we propose a modified sampling scheme. Recall that $T_u$ is
the sampling time for control inputs, and we define $T_s$ to be the sampling time for state
measurements. We require that $kT_s = T_u$ for some $k \in \mathbb{Z}_+$; this scheme is illustrated in
Figure 1 for the case of $k = 4$. The advantage of this sampling scheme is that the trajectory
$x(t)$ is smooth ($Q$-times differentiable, by Lemma 1) in between the samples taken at $mT_u$,
because the control input $u(t)$ is piecewise constant. This allows us to use LPR of order
higher than zero (e.g., local linear regression), which can give significant improvements in
estimation error over zeroth-order LPR.

If the trajectory of the real system is $x(t)$, then consider a measurement model

$$\xi(iT_s) = x(iT_s) + \epsilon_i, \qquad (5)$$
Figure 1: We use a sampling scheme with two sampling periods. The inputs change
every $T_u$ units of time, and the states are measured every $T_s$ units of time. In this example,
$kT_s = T_u$ with $k = 4$.



where $\epsilon_i$ are independent and identically distributed (i.i.d.) random variables with zero
mean and bounded values $l_\mu < [\epsilon_i]_\mu < s_\mu$. The notation $[\epsilon_i]_\mu$ indicates the $\mu$-th
component of the $i$-th noise vector. Suppose that we have made measurements up to time
$t = nT_u$, i.e., for $m = 0, \ldots, n$. This measurement model corresponds to the sampling scheme
shown in Figure 1.



3.1 Filter Design 

Suppose $\mathcal{K}(v)$ is a kernel function, which is a bounded even function with finite support. We
will use $\lambda, \rho$ to denote left and right differentiability, and $r$ is the polynomial order of the
filter. Let $h_{\lambda;m,i}, h_{\rho;m,i} \in \mathbb{R}_+$ be bandwidth parameters. Next we define a diagonal matrix
$R_{m,i}$ that is used to filter to the right side of the $i$-th entry of the measurement at $t = mT_u$;
its entries are given by

$$R_{m,i} = \operatorname{diag}\{\mathcal{K}(0), \mathcal{K}(T_s/h_{\lambda;m,i}), \ldots, \mathcal{K}(kT_s/h_{\lambda;m,i})\}. \qquad (6)$$

Similarly, we define a diagonal matrix $L_{m,i}$ that is used to filter to the left side of the $i$-th
entry of the measurement at $t = mT_u$:

$$L_{m,i} = \operatorname{diag}\{\mathcal{K}(kT_s/h_{\rho;m,i}), \ldots, \mathcal{K}(T_s/h_{\rho;m,i}), \mathcal{K}(0)\}. \qquad (7)$$

Note that the $R_{m,i}$ matrix uses the bandwidth $h_{\lambda;m,i}$, and $L_{m,i}$ uses the bandwidth $h_{\rho;m,i}$.
The reason is that filtering to the right of a measurement requires left differentiability, while
filtering to the left of a measurement requires right differentiability. Lastly, we define the
Vandermonde matrix

$$X = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 1 & T_s & \cdots & T_s^r \\ \vdots & \vdots & \ddots & \vdots \\ 1 & kT_s & \cdots & k^r T_s^r \end{bmatrix}. \qquad (8)$$



We are now ready to design the filter. The filter coefficients $w_{m,i}$ and $v_{m,i}$ are given by

(9)

where $e_1$ is the unit vector with a 1 in the first position and zeros everywhere else. The idea is
that $w_{m,i}$ filters on the left side of $t = mT_u$ and $v_{m,i}$ filters on the right side of $t = mT_u$. As
time advances to $t = nT_u$, we first filter on the left side of $\xi(nT_u)$ (because there are no
measurements on its right side yet). At the next point in time, $t = (n+1)T_u$, we filter on both
sides of $\xi(nT_u)$. Consequently, the filter is time-varying.
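As a hedged illustration, the sketch below assumes that the right-side coefficients take the standard weighted-least-squares form of local polynomial regression, $v' = e_1'(X'RX)^{-1}X'R$, built from the Vandermonde matrix (8) and the diagonal kernel weights (6). The biweight kernel, the function names, and the numeric values are assumptions made for the example, not the note's own implementation.

```python
def biweight(v):
    """Example kernel (an assumption): bounded, even, with finite support |v| < 1."""
    return (1.0 - v * v) ** 2 if abs(v) < 1.0 else 0.0


def solve(M, b):
    """Solve the small linear system M z = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [[float(a) for a in row] + [float(bi)] for row, bi in zip(M, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for i in range(c + 1, n):
            f = aug[i][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[i][j] -= f * aug[c][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (aug[i][n] - sum(aug[i][j] * z[j] for j in range(i + 1, n))) / aug[i][i]
    return z


def right_side_weights(k, T_s, h, r, kernel=biweight):
    """Weights v with v' = e1'(X'RX)^{-1} X'R for samples at offsets j*T_s, j = 0..k."""
    X = [[(j * T_s) ** q for q in range(r + 1)] for j in range(k + 1)]  # Vandermonde (8)
    R = [kernel(j * T_s / h) for j in range(k + 1)]                     # diagonal of (6)
    XtRX = [[sum(X[j][a] * R[j] * X[j][b] for j in range(k + 1)) for b in range(r + 1)]
            for a in range(r + 1)]
    # X'RX is symmetric, so z = (X'RX)^{-1} e1 gives e1'(X'RX)^{-1} = z'.
    z = solve(XtRX, [1.0] + [0.0] * r)
    return [R[j] * sum(z[a] * X[j][a] for a in range(r + 1)) for j in range(k + 1)]


v = right_side_weights(k=4, T_s=0.25, h=1.5, r=1)
# Local linear weights reproduce any line exactly at the offset-0 point.
print(sum(v))                                                      # approximately 1.0
print(sum(vj * (3.0 + 2.0 * j * 0.25) for j, vj in enumerate(v)))  # approximately 3.0
```

Under the same assumption, one natural way to obtain left-side weights is to reverse the output of `right_side_weights` computed with $h_{\rho;m,i}$, so that the weight attached to $\mathcal{K}(0)$ multiplies the sample at $t = mT_u$, mirroring the ordering of $L_{m,i}$ in (7). This is our assumption rather than a restatement of (9).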

Let the number within the angled brackets $\langle \cdot \rangle$ denote the (discrete) time at which the
filter is computed, and let $\xi_i$ denote the $i$-th component of $\xi$. The raw state estimates (for
times $t = mT_u$, for $m = 0, \ldots, n$) computed at time $t = nT_u$ are given by

$$\begin{aligned}
[\tilde{x}_n]_i\langle n\rangle &= \textstyle\sum_{j=0}^{k} [w_{n,i}]_j\, \xi_i(jT_s + (n-1)T_u), \\
[\tilde{x}_{n-1}]_i\langle n\rangle &= \tfrac{1}{2}[\tilde{x}_{n-1}]_i\langle n-1\rangle + \tfrac{1}{2}\textstyle\sum_{j=0}^{k} [v_{n-1,i}]_j\, \xi_i(jT_s + (n-1)T_u), \qquad (10)\\
[\tilde{x}_m]_i\langle n\rangle &= [\tilde{x}_m]_i\langle n-1\rangle, \quad \forall m < n-1.
\end{aligned}$$

The state estimates are given by

$$[\hat{x}_m]_i\langle n\rangle = \min\big\{\xi_i(mT_u) - l_i,\ \max\{\xi_i(mT_u) - s_i,\ [\tilde{x}_m]_i\langle n\rangle\}\big\}, \quad \forall m. \qquad (11)$$

The operation in (11) maintains the bounds on the noise: it makes the filter saturate if it
tries to exceed the bounds implied by the noise. This filtering is well-defined because of the
piecewise continuity of the control input $u(t)$, and it is consistent in a pointwise sense,
as the following theorem shows.
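The recursion (10) and the saturation (11) are cheap to implement once the coefficients are known. The following is a minimal sketch for a single state component $i$, under the assumptions that the raw estimates are stored in a dictionary keyed by $m$ and that the weight vectors are supplied by the caller; the function names and the toy weights are hypothetical.

```python
def filter_update(raw, window, w_n, v_prev, n):
    """One application of (10) at time t = n*T_u for a single state component i.

    raw    -- dict m -> raw estimate from the previous step
    window -- measurements xi_i(j*T_s + (n-1)*T_u) for j = 0..k
    w_n    -- left-side coefficients for t = n*T_u
    v_prev -- right-side coefficients for t = (n-1)*T_u
    """
    new = dict(raw)  # estimates with m < n-1 are carried over unchanged
    new[n] = sum(wj * xj for wj, xj in zip(w_n, window))
    right = sum(vj * xj for vj, xj in zip(v_prev, window))
    new[n - 1] = 0.5 * raw[n - 1] + 0.5 * right  # average the left- and right-side fits
    return new


def saturate(raw_m, xi_m, l_i, s_i):
    """Apply (11): clip the raw estimate into [xi_m - s_i, xi_m - l_i]."""
    return min(xi_m - l_i, max(xi_m - s_i, raw_m))


raw = {0: 1.0}                       # assumed initialization with the first measurement
window = [1.0, 1.2, 0.8, 1.1, 1.0]   # k = 4 samples on [0, T_u]
w = [0.0, 0.0, 0.0, 0.0, 1.0]        # toy left-side weights, for illustration only
v = [1.0, 0.0, 0.0, 0.0, 0.0]        # toy right-side weights, for illustration only
raw = filter_update(raw, window, w, v, n=1)
estimates = {m: saturate(raw[m], xi_m, -0.1, 0.1) for m, xi_m in [(0, 1.0), (1, 1.0)]}
```

Keeping the raw estimates separate from the saturated ones matches (10)-(11), where the averaging step uses the previous raw value and the clipping is applied only when reporting estimates.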

Theorem 2 (Ruppert and Wand, 1994). If $T_u$ is fixed, $r$ is the polynomial order of the
filter, and $k \to \infty$ such that $kT_s = T_u$, then the filter defined in (10)-(11) is consistent:
$\|x_m - \hat{x}_m\langle n\rangle\| = O_p(k^{-(r+1)/(2r+3)})$.

Proof. Strictly speaking, the result in [TS] applies to the filter defined in (9)-(10). Consistency
with respect to (11) is established by noting that the bounds on the noise imply that the error
of the saturated estimate $\hat{x}_m\langle n\rangle$ is no larger than the error of the raw estimate computed in
(10), since (11) projects the raw estimate onto a box that contains $x_m$. □

Remark. Because $k = T_u/T_s$, this theorem intuitively says that the filter performs well as
long as $T_s$ is much smaller than $T_u$.
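As a worked instance of the rate in Theorem 2 (simple arithmetic on the stated bound, not an additional result), take local linear filtering, $r = 1$:

```latex
\frac{r+1}{2r+3}\Big|_{r=1} = \frac{2}{5},
\qquad
\|x_m - \hat{x}_m\langle n\rangle\| = O_p\!\left(k^{-2/5}\right) = O_p\!\left((T_s/T_u)^{2/5}\right).
% Halving T_s at fixed T_u doubles k and scales the bound by 2^{-2/5} \approx 0.758.
```

For quadratic filtering, $r = 2$, the exponent becomes $3/7$.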



We also have the following lemma, which discusses the finite-sample properties of (11).
The intuition is that if the measurement noise is bounded and all states are measured,
then the filter preserves the property that the state estimates remain within a bounded
distance of the true states. Note that the Minkowski sum [15] of two sets $U, V$ is defined as
$U \oplus V = \{u + v : u \in U, v \in V\}$.

Lemma 2. Under the assumptions delineated above, we have that $\hat{x}_m \in x_m \oplus \mathcal{E}$, where
$\mathcal{E} = \{e : l_j \leq [e]_j \leq s_j\} \oplus (-\{e : l_j \leq [e]_j \leq s_j\})$.

Proof. Write $\xi_m = \xi(mT_u)$. Note that (11) enforces $[\xi_m]_j - s_j \leq [\hat{x}_m]_j \leq [\xi_m]_j - l_j$, which
can be rewritten as $\hat{x}_m \in \xi_m \oplus (-\{e : l_j \leq [e]_j \leq s_j\})$. The bounds on the noise,
$[x_m]_j + l_j \leq [\xi_m]_j \leq [x_m]_j + s_j$, are equivalent to $\xi_m \in x_m \oplus \{e : l_j \leq [e]_j \leq s_j\}$. The
result follows from the associativity of the Minkowski sum. □
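Lemma 2 can be spot-checked by simulation. In the sketch below (an illustration with hypothetical noise bounds and deliberately poor raw estimates), the noise is drawn uniformly from $[l_j, s_j]$ and the saturation (11) is applied componentwise; the estimation error always lands in $\mathcal{E}$, which for boxes is $[l_j - s_j, s_j - l_j]$ in each component.

```python
import random


def error_box(l, s):
    """Componentwise bounds of E = [l, s] (+) (-[l, s]), i.e. [l - s, s - l]."""
    return [(lj - sj, sj - lj) for lj, sj in zip(l, s)]


def saturate(raw, xi, l, s):
    """Componentwise form of (11): clip each raw estimate into [xi_j - s_j, xi_j - l_j]."""
    return [min(x - lj, max(x - sj, r)) for r, x, lj, sj in zip(raw, xi, l, s)]


rng = random.Random(0)
l, s = [-0.2, -0.05], [0.1, 0.3]        # hypothetical noise bounds per component
box = error_box(l, s)
worst = [0.0, 0.0]
for _ in range(1000):
    x = [rng.uniform(-5.0, 5.0) for _ in l]                # true state
    eps = [rng.uniform(lj, sj) for lj, sj in zip(l, s)]    # bounded noise
    xi = [a + e for a, e in zip(x, eps)]                   # measurement
    raw = [rng.gauss(0.0, 10.0) for _ in l]                # deliberately poor raw estimate
    est = saturate(raw, xi, l, s)
    worst = [max(wj, abs(e - a)) for wj, e, a in zip(worst, est, x)]
print(box, worst)  # each worst-case error is at most s_j - l_j
```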
3.2 Filter Implementation



Because the filter is simply a weighted sum of measurements (10), the main difficulty in
implementation is computing the filter coefficients (9). The first step is to choose the
order of the filter. Empirical results show that linear ($r = 1$) or quadratic ($r = 2$)
LPR typically gives good results. For clarity of presentation, we focus here on the case of
$r = 1$.

Having chosen the order of the filter, the next step is to compute the bandwidth parameters
$h_{\lambda;m,i}, h_{\rho;m,i}$. To make the notation compact, let $?$ be a placeholder that is replaced
with either $? = \rho$ or $? = \lambda$. Using results from [6], it can be shown that the optimal bandwidths
for $r = 1$ are approximately given by

$$h_{?;m,i} \approx \left( \frac{\sigma_i^2\, T_u}{\ddot{x}_i(mT_u^{?})^2\, k} \cdot \frac{\int \mathcal{K}^2(v)\,dv}{\left(\int v^2 \mathcal{K}(v)\,dv\right)^2} \right)^{1/5}, \qquad (12)$$

where $\sigma_i^2$ is the variance of the $i$-th noise component, and the second time-derivative
$\ddot{x}_i(mT_u^{?})$ is the left-sided derivative if $? = \lambda$ (or the right-sided derivative if $? = \rho$).
Unsimplified expressions for the cases $r > 1$ can be found in [5]. We can approximate the values
of these second time-derivatives by using (1), neglecting the unknown $g_c$. More specifically, the
estimated values are given by $\ddot{x}_i(mT_u^{\lambda}) = [A_c^2 \xi(mT_u) + A_c B_c u_m]_i$ and
$\ddot{x}_i(mT_u^{\rho}) = [A_c^2 \xi(mT_u) + A_c B_c u_{m-1}]_i$.

Because it is time consuming to compute the filter coefficients (9), we suggest an
implementation in which they are precomputed. Define a set $\mathcal{H} = \{h_{\min}, \ldots, h_{\max}\}$, and
compute the filter coefficients for each value in $\mathcal{H}$. Then, when we would like to filter, we
estimate the time derivatives $\ddot{x}_i(mT_u^{\lambda})$ and $\ddot{x}_i(mT_u^{\rho})$, and use these to compute $h_{?;m,i}$.
The closest value in $\mathcal{H}$ is selected, and the corresponding set of precomputed filter
coefficients is used to do the filtering as defined in (10)-(11).
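The precomputation strategy can be sketched as follows, under the assumptions that the bandwidth grid $\mathcal{H}$ is sorted and that the second time-derivatives are estimated from the nominal model as $A_c(A_c\xi(mT_u) + B_c u)$, which neglects $g_c$. The function names and the placeholder table are hypothetical, and the bandwidth formula itself is left to the caller.

```python
import bisect


def matvec(M, v):
    """Dense matrix-vector product for matrices stored as lists of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def second_derivative_estimate(A, B, xi_m, u):
    """Nominal-model estimate A_c^2 xi + A_c B_c u = A_c (A_c xi + B_c u)."""
    first = [a + b for a, b in zip(matvec(A, xi_m), matvec(B, u))]
    return matvec(A, first)


def nearest_index(grid, h):
    """Index of the entry of the sorted bandwidth grid closest to h."""
    i = bisect.bisect_left(grid, h)
    if i == 0:
        return 0
    if i == len(grid):
        return len(grid) - 1
    return i if grid[i] - h < h - grid[i - 1] else i - 1


def lookup_coefficients(grid, table, h):
    """Return the precomputed filter coefficients for the grid value nearest to h."""
    return table[nearest_index(grid, h)]


grid = [0.1, 0.2, 0.4, 0.8]                         # hypothetical set H
table = [("coefficients for h", h) for h in grid]   # placeholder for precomputed (9)
print(lookup_coefficients(grid, table, 0.33))       # ('coefficients for h', 0.4)
```

The lookup costs $O(\log |\mathcal{H}|)$ per filtering step, so the expensive matrix inversions behind (9) are paid only once, offline.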



4 Epi-convergence Proofs

We provide proofs of the theorems regarding convergence of the control law of LBMPC to that
of an MPC that knows the unmodeled dynamics, both for the case where the oracle is parametric
and for the case where the oracle is nonparametric. The key to these results is that the system
trajectory must have a property called sufficient excitation (SE), which intuitively means
that all modes of the system are perturbed so that they can be identified. The convergence
theorem is trivial in the parametric case, because it results from combining two existing
theorems that are valid under SE. In the nonparametric case, we consider both a generic
oracle and an oracle that we have designed, which we call the L2-regularized Nadaraya-Watson
(L2NW) estimator. The proofs for this case are more involved, since they require
showing epi-convergence of the nonparametric oracles under the notion of SE.
4.1 Parametric Oracle



Proof of Theorem 5 in [3]. The proof simply requires application of existing theorems. If $\lambda_n$
converges in probability to $\lambda_0$, then the result is true by Proposition 2.1 of [18]. The required
convergence in probability occurs under SE [HI HO], and so the result follows. □

Remark. The situation in which the states are measured with noise requires the use of
the continuous mapping theorem [16] in conjunction with Theorem 2. For the case
where the parameters enter linearly, the hypothesis of the continuous mapping theorem is
satisfied because the linear least-squares estimate $\lambda_n$ is continuous with respect to the
measurements given SE [8]. For the nonlinear case, we need to explicitly assume that there
is a unique $\lambda_0$; under this assumption, $\lambda_0$ is a minimizer of the least-squares problem with
no noise, and so it is continuous with respect to the measurements by the Berge maximum
theorem [J]. (The Berge maximum theorem gives upper hemicontinuity of the minimizer, which
results in continuity because the minimizer is single-valued due to its assumed uniqueness.)
This allows for the use of the continuous mapping theorem.

4.2 Nonparametric Oracle 

We first prove convergence of the control law of LBMPC that uses a generic nonparametric
oracle, under an assumption of SE. This result will then be used to prove a corresponding
theorem for the case in which the oracle is taken to be the L2NW estimator. In this section,
we will refer to the functions $\psi_n, \phi, \psi_0$ that are defined in Theorem 4 of [3].

The first theorem we present pertains to convergence in probability of the composition of
functions that individually converge in probability. We need this theorem in order to show
epi-convergence of $\psi_n$ for the LBMPC problem that uses a nonparametric oracle that
stochastically converges in the appropriate sense (which we will define later).

Theorem 3. Let $\mathcal{X}_v \subseteq \mathbb{R}^a$, $\mathcal{X}_w \subseteq \mathbb{R}^p$, and $\mathcal{R} \subseteq \mathbb{R}^q$ be compact sets, and assume
that we have sequences of functions $V_n(x) : \mathcal{X}_v \to \mathcal{X}_w$ and $W_n(x) : \mathcal{X}_w \to \mathcal{R}$ which converge
in probability to $V(x), W(x)$ as $\sup_{x \in \mathcal{X}_v} \|V_n(x) - V(x)\| = O_p(r_n)$ and
$\sup_{x \in \mathcal{X}_w} \|W_n(x) - W(x)\| = O_p(s_n)$. If $W$ is Lipschitz continuous with constant $L_w$, then
$\sup_{x \in \mathcal{X}_v} \|W_n(V_n(x)) - W(V(x))\| = O_p(c_n)$, where $c_n = \max\{r_n, s_n\}$.

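Proof. Fix $M > 0$. Applying the triangle inequality and a union bound gives

$$P\Big(\sup_{x \in \mathcal{X}_v} \|W_n(V_n(x)) - W(V(x))\|/c_n > M\Big) \leq P\Big(\sup_{x \in \mathcal{X}_v} \|W_n(V_n(x)) - W(V_n(x))\|/c_n > M/2\Big) + P\Big(\sup_{x \in \mathcal{X}_v} \|W(V_n(x)) - W(V(x))\|/c_n > M/2\Big). \qquad (13)$$

Because $V_n(x) \in \mathcal{X}_w$ and $c_n \geq s_n$, the first term on the right in (13) can be bounded as

$$P\Big(\sup_{x \in \mathcal{X}_v} \|W_n(V_n(x)) - W(V_n(x))\|/c_n > M/2\Big) \leq P\Big(\sup_{x \in \mathcal{X}_w} \|W_n(x) - W(x)\|/s_n > M/2\Big). \qquad (14)$$

The second term on the right in (13) is bounded using the Lipschitz constant and $c_n \geq r_n$ as

$$P\Big(\sup_{x \in \mathcal{X}_v} \|W(V_n(x)) - W(V(x))\|/c_n > M/2\Big) \leq P\Big(\sup_{x \in \mathcal{X}_v} \|V_n(x) - V(x)\|/r_n > M/(2L_w)\Big). \qquad (15)$$

Given any $\delta > 0$, the assumptions $\sup_{x \in \mathcal{X}_w}\|W_n(x) - W(x)\| = O_p(s_n)$ and
$\sup_{x \in \mathcal{X}_v}\|V_n(x) - V(x)\| = O_p(r_n)$ allow us to choose $M$ and $N$ such that the right-hand sides
of (14) and (15) are each less than $\delta/2$ for all $n \geq N$. By (13), the left-hand side of (13) is then
less than $\delta$ for all $n \geq N$, which is the definition of $O_p(c_n)$. □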

Remark. The theorem shows that convergence in probability is preserved under composition,
but the one subtlety in the result and subsequent proof is the issue of domains of convergence.
We are composing two functions, $W_n(V_n(x))$, and convergence occurs as long as the range of
the inside function $V_n(\cdot)$ lies within the domain of convergence of the outside function
$W_n(\cdot)$.

Proof of Theorem 6 in [3]. Note that the equality constraint in LBMPC

$$x_{n+i+1} = Ax_{n+i} + Bu_{n+i} + \mathcal{O}_n(x_{n+i}, u_{n+i}), \qquad u_{n+i} = Kx_{n+i} + c_{n+i}, \qquad (16)$$

recursively defines $x_{n+i+1}$, for $i \in \{0, \ldots, N-1\}$, as a function of only $x_n$ and
$c_n, \ldots, c_{n+i}$. For example, the equation for $x_{n+2}$ is given by

$$\begin{aligned}
x_{n+2}(x_n, \mathcal{O}_n) = {} & A^2x_n + AB(Kx_n + c_n) + A\mathcal{O}_n(x_n, Kx_n + c_n) \\
& + B\big(K(Ax_n + B(Kx_n + c_n) + \mathcal{O}_n(x_n, Kx_n + c_n)) + c_{n+1}\big) \\
& + \mathcal{O}_n\big(Ax_n + B(Kx_n + c_n) + \mathcal{O}_n(x_n, Kx_n + c_n),\ K(Ax_n + B(Kx_n + c_n) + \mathcal{O}_n(x_n, Kx_n + c_n)) + c_{n+1}\big). \qquad (17)
\end{aligned}$$

Using our assumption along with the continuous mapping theorem, we have that

$$\sup \|x_{n+i}(x_n, \mathcal{O}_n) - x_{n+i}(x_n, g)\| = O_p(r_n), \qquad (18)$$

where $r_n$ is the convergence rate by assumption. Since $\psi_n$ is continuous, we can compose it
with $x_{n+i}$ using Theorem 3. This gives that $\sup \|\psi_n - \psi_0\| = O_p(r_n)$.

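The last step requires showing that this condition implies lower semicontinuity in
probability. For notational convenience, we define $c = [c_n' \ \cdots \ c_{n+N-1}']'$. Because $\psi_0$
is continuous, given $\epsilon > 0$ and a point $(x_0, c, \theta)$, there exists a neighborhood $U\{x_0, c, \theta\}$
such that

$$|\psi_0(\zeta) - \psi_0(x_0, c, \theta)| < \epsilon/2, \qquad (19)$$

for all $\zeta \in U\{x_0, c, \theta\}$. Now consider the expression

$$\alpha = P\Big(\inf_{\zeta \in U\{x_0,c,\theta\}} \psi_n(\zeta) < \psi_0(x_0, c, \theta) - \epsilon\Big) \leq P\Big(\sup_{\zeta \in U\{x_0,c,\theta\}} |\psi_n(\zeta) - \psi_0(x_0, c, \theta)| > \epsilon\Big). \qquad (20)$$

Using (19) and the triangle inequality, we can further bound the expression above by

$$\alpha \leq P\Big(\sup_{\zeta \in U\{x_0,c,\theta\}} |\psi_n(\zeta) - \psi_0(\zeta)| > \epsilon/2\Big). \qquad (21)$$

Taking the limit, and using $\sup \|\psi_n - \psi_0\| = O_p(r_n)$ with $r_n \to 0$, we have that $\lim \alpha = 0$,
and so the result follows by applying Proposition 5.1 of [HZ]. □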
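The recursion (16) that underlies (17) can be made concrete with a scalar sketch. Here the model coefficients, the feedback gain, and the oracle are hypothetical values chosen for illustration; the check confirms that rolling the recursion forward two steps reproduces the expanded expression (17).

```python
import math


def rollout(x_n, c, A, B, K, oracle):
    """Roll out x_{n+i+1} = A x + B u + O_n(x, u) with u = K x + c_{n+i} (scalar case)."""
    xs = [x_n]
    for c_i in c:
        x = xs[-1]
        u = K * x + c_i
        xs.append(A * x + B * u + oracle(x, u))
    return xs


def x_n_plus_2(x, c0, c1, A, B, K, O):
    """The expanded two-step expression (17), with x_{n+1} abbreviated as x1."""
    x1 = A * x + B * (K * x + c0) + O(x, K * x + c0)   # x_{n+1}
    return (A ** 2 * x + A * B * (K * x + c0) + A * O(x, K * x + c0)
            + B * (K * x1 + c1)
            + O(x1, K * x1 + c1))


def oracle(x, u):
    """Hypothetical learned correction O_n(x, u), used only for illustration."""
    return 0.1 * x * u


A, B, K = 0.9, 0.5, -0.4                  # hypothetical scalar model and feedback gain
xs = rollout(1.0, [0.2, -0.1], A, B, K, oracle)
print(math.isclose(xs[2], x_n_plus_2(1.0, 0.2, -0.1, A, B, K, oracle)))  # True
```

The rollout makes explicit that each predicted state depends only on $x_n$ and the sequence $c_n, c_{n+1}, \ldots$, which is what allows Theorem 3 to be applied step by step.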
4.3 L2NW Estimator



Showing that the L2NW estimator leads to convergence of the control law of LBMPC under
an assumption of SE requires additional work. For ease of reference, we give one expression
of the L2NW estimator defined in [3]. Let $X_i = [x_i'\ u_i']'$, $Y_i = x_{i+1} - (Ax_i + Bu_i)$, and
$\Xi_i = \|\xi - X_i\|^2/h^2$, where $X_i \in \mathbb{R}^{p+m}$ and $Y_i \in \mathbb{R}^p$ are data and $\xi = [x'\ u']'$ are free
variables. We define any function $\mathcal{K} : \mathbb{R} \to \mathbb{R}_+$ to be a kernel function if it has (a) finite
support, $\mathcal{K}(\nu) = 0$ for $|\nu| > 1$; (b) even symmetry, $\mathcal{K}(\nu) = \mathcal{K}(-\nu)$; (c) positive values,
$\mathcal{K}(\nu) > 0$ for $|\nu| < 1$; (d) differentiability (i.e., the derivative $d\mathcal{K}/d\nu$ exists); and (e)
nonincreasing values of $\mathcal{K}(\nu)$ over $\nu > 0$. The L2NW estimator is defined as

$$\mathcal{O}_n(x, u; X_i, Y_i) = \frac{\sum_i Y_i\, \mathcal{K}(\Xi_i)}{\lambda + \sum_i \mathcal{K}(\Xi_i)}, \qquad (22)$$

where $\lambda \in \mathbb{R}_+$. If $\lambda = 0$, then (22) is simply the Nadaraya-Watson (NW) estimator. The $\lambda$
term acts to regularize the problem and ensures differentiability: for $\lambda > 0$, the denominator in
(22) remains positive even at points where no data lie within the bandwidth.
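A direct transcription of (22) is short. The sketch below is illustrative: the biweight kernel $\mathcal{K}(\nu) = (1-\nu^2)^2$ for $|\nu| < 1$ is one choice satisfying (a)-(e), the data are hypothetical, and returning zero when $\lambda = 0$ and no data fall inside the bandwidth is a convention we assume, since the NW estimator is undefined there.

```python
def kernel(nu):
    """Biweight kernel (assumed choice): satisfies properties (a)-(e) above."""
    return (1.0 - nu * nu) ** 2 if abs(nu) < 1.0 else 0.0


def l2nw(xi, X, Y, h, lam):
    """L2NW estimate (22): sum_i K(Xi_i) Y_i / (lam + sum_i K(Xi_i)) at query xi = [x' u']'."""
    weights = []
    for X_i in X:
        dist2 = sum((a - b) ** 2 for a, b in zip(xi, X_i))
        weights.append(kernel(dist2 / h ** 2))   # kernel at Xi_i = ||xi - X_i||^2 / h^2
    denom = lam + sum(weights)
    dim = len(Y[0])
    if denom == 0.0:
        return [0.0] * dim  # only when lam = 0 and no data nearby: assumed convention
    return [sum(w * Y_i[d] for w, Y_i in zip(weights, Y)) / denom for d in range(dim)]


X = [[0.0, 0.0], [0.5, 0.0], [3.0, 3.0]]   # hypothetical data X_i = [x_i' u_i']'
Y = [[1.0], [3.0], [100.0]]                # hypothetical targets Y_i
print(l2nw([0.0, 0.0], X, Y, h=1.0, lam=0.0))    # NW: the far point (3, 3) gets zero weight
print(l2nw([10.0, 10.0], X, Y, h=1.0, lam=0.5))  # no data nearby; lam > 0 keeps it defined
```

With $\lambda > 0$, the estimate is pulled toward zero where data are sparse; this shrinkage is the source of the $\mu M_g$ term in Lemma 4.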

We begin by proving a uniform version of a theorem that is called either the continuous
mapping theorem [TB] or Slutsky's theorem [5], depending on the author.

Lemma 3. Given random variables $V_n, V \in \mathcal{V}$, for all $n \in \mathbb{Z}_+$, such that $\|V_n - V\| = O_p(r_n)$,
if $L(x, v) : \mathcal{X} \times \mathcal{V} \to \mathbb{R}$ is a continuous function and $\mathcal{X}, \mathcal{V}$ are compact sets, then
$\sup_{x \in \mathcal{X}} |L(x, V_n) - L(x, V)| = O_p(r_n)$.

Proof. The Heine-Cantor theorem (Theorem 4.19 in [12]) gives uniform continuity of $L(x, v)$
on $\mathcal{X} \times \mathcal{V}$, and this implies that for all $x$, $\|V_n - V\| > \delta > 0$ whenever
$|L(x, V_n) - L(x, V)| > \epsilon > 0$. Proceeding analogously to [IB], we have

$$P\Big(\sup_{x \in \mathcal{X}} |L(x, V_n) - L(x, V)| > \epsilon\Big) = P\big(\exists x : |L(x, V_n) - L(x, V)| > \epsilon\big) \leq P(\|V_n - V\| > \delta). \qquad (23)$$

The result is immediate. □

We can now show the first convergence result for the L2NW estimator. Let $\hat{X}_i, \hat{Y}_i$ be
defined in the same way as $X_i, Y_i$, with the change that $\hat{X}_i, \hat{Y}_i$ are defined using state
estimates $\hat{x}$ instead of noiseless measurements of the state $x$. The intuition for why this result
is true is that although the noise in $\hat{Y}_i$ and $\hat{X}_i$ is correlated, our filtering defined in
Section 3 makes this correlation asymptotically insignificant. This result can be interpreted in an
instrumental variables context [T] [H].

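Corollary 1. If $\|\hat{X}_i - X_i\| = O_p(r_k)$, where $k \in \mathbb{N}_+$, then
$\sup_{x \in \mathcal{X}, u \in \mathcal{U}} \|\mathcal{O}_n(x, u; \hat{X}_i, \hat{Y}_i) - \mathcal{O}_n(x, u; X_i, Y_i)\| = O_p(r_k)$.

Proof. Define a random variable $V_k = [\hat{X}_0' \ \cdots\ \hat{X}_{n-1}'\ \hat{Y}_0' \ \cdots\ \hat{Y}_{n-1}']'$, and let $V$ be the
corresponding limiting vector $[X_0' \ \cdots\ X_{n-1}'\ Y_0' \ \cdots\ Y_{n-1}']'$. The definition of $\hat{Y}_i$ and the
corollary's assumption imply that $\|\hat{Y}_i - Y_i\| = O_p(r_k)$, and so $\|V_k - V\| = O_p(n \cdot r_k)$, which
is $O_p(r_k)$ for fixed $n$.

Now consider the functions $\eta = \sum_i \mathcal{K}(\Xi_i) Y_i/n$, $\delta = \sum_i \mathcal{K}(\Xi_i)/n$, and
$\rho = \eta/(\lambda/n + \delta)$. Applying Lemma 3 gives $\sup_{x \in \mathcal{X}, u \in \mathcal{U}} \|\eta(\xi; V_k) - \eta(\xi; V)\| = O_p(r_k)$
and $\sup_{x \in \mathcal{X}, u \in \mathcal{U}} \|\delta(\xi; V_k) - \delta(\xi; V)\| = O_p(r_k)$. Another application of Lemma 3 gives
$\sup_{x \in \mathcal{X}, u \in \mathcal{U}} \|\rho(\xi; V_k) - \rho(\xi; V)\| = O_p(r_k)$. The result follows by noting that
$\mathcal{O}_n(x, u; \hat{X}_i, \hat{Y}_i) = \rho(\xi; V_k)$ and $\mathcal{O}_n(x, u; X_i, Y_i) = \rho(\xi; V)$. □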
Remark. The variance of the NW estimator in its typical setup is known to converge uniformly
at a rate no faster than $n^{-4/(p+4)}$ [13]. Our result gives a nonstandard rate of convergence,
$r_k$, because we have a time-series setup with presmoothing to account for the errors
in measurements.

Convergence of an estimator is often studied by decomposing the estimation error into
a bias term and a variance term. For proving convergence of our L2NW estimator, we have to be
careful in defining the probabilistic framework before we can decompose the error into two
terms. The $X_i$ values are not independent variables drawn from some probability distribution;
they are exactly the states of a deterministic system as it evolves in time. In fact,
if the control inputs $u_n$ are (deterministically or statistically) known for each point in time,
then $X_i$ and $X_j$ are dependent for all $i > j$.

For a nonlinear system, SE is usually defined using ergodicity or mixing, but this is
hard to verify in general. Instead, we define SE as a finite sample cover (FSC) of $\mathcal{X}$. Let
$B_h(x) = \{y : \|x - y\| < h\}$ be a ball centered at $x$ with radius $h$; then an FSC of $\mathcal{X}$ is a set
$S_h = \bigcup_i B_{h/2}(X_i)$ that satisfies $\mathcal{X} \subseteq S_h$. The intuition is that the $\{X_i\}$ sample $\mathcal{X}$
densely enough that every point of $\mathcal{X}$ lies within distance $h/2$ of some sample. Assuming SE in
the form of an FSC with asymptotically decreasing radius $h$, we can show that the control law of
LBMPC that uses L2NW converges to that of an MPC that knows the true dynamics.
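Whether a given trajectory provides a finite sample cover can be checked on a finite set of test points. The sketch below only approximates the definition, since it checks $\mathcal{X} \subseteq S_h$ on a user-supplied grid rather than on all of $\mathcal{X}$; the sample and grid values are hypothetical.

```python
import math


def covers(samples, points, h):
    """True if every test point lies in S_h, i.e. within distance h/2 of some sample X_i."""
    return all(any(math.dist(p, X_i) < h / 2 for X_i in samples) for p in points)


samples = [(i / 10,) for i in range(11)]     # hypothetical 1-D samples with spacing 0.1
points = [(j / 200,) for j in range(201)]    # test grid on [0, 1]
print(covers(samples, points, h=0.2))   # True: every point is within 0.05 < 0.1 of a sample
print(covers(samples, points, h=0.08))  # False: midpoints are 0.05 > 0.04 from the nearest sample
```

The check uses the strict inequality from the definition of the open ball $B_{h/2}$.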

Recall that $g(x,u)$ is the modeling error of the approximate linear system defined in
[3]. We have the following result, which shows that the L2NW estimator with noiseless
measurements and SE can approximate the unmodeled dynamics arbitrarily well.

Lemma 4. If $g(x,u)$ is Lipschitz with constant $L$ and $S_h$ is a finite sample cover of
$\mathcal{Z} \subseteq \mathcal{X} \times \mathcal{U}$, then $\sup_{(x,u) \in \mathcal{Z}} \|g(x,u) - \mathcal{O}_n(x, u; X_i, Y_i)\| \leq \mu M_g + (1 - \mu)Lh$, where
$\mu = \lambda/(\lambda + \mathcal{K}(1/2))$ and $M_g = \max_{(x,u) \in \mathcal{X} \times \mathcal{U}} \|g(x,u)\|$.

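Proof. Define $\mathcal{I} = \{i : \Xi_i < 1\}$, and note that $\mathcal{K}(\Xi_j) = 0$ for $j \notin \mathcal{I}$. An alternative
characterization of the L2NW estimator is as the positively weighted average
$\mathcal{O}_n(x, u; X_i, Y_i) = w_0 \cdot 0 + \sum_{i \in \mathcal{I}} w_i Y_i$, where $w_0, w_i \geq 0$,
$w_i = \mathcal{K}(\Xi_i)/(\lambda + \sum_j \mathcal{K}(\Xi_j))$, $w_0 = \lambda/(\lambda + \sum_j \mathcal{K}(\Xi_j))$, and $w_0 + \sum_i w_i = 1$.
The finite sample cover property of $S_h$ implies that $\sum_j \mathcal{K}(\Xi_j) \geq \mathcal{K}(1/2)$, since some $X_j$ lies
within distance $h/2$ of $(x,u)$. Noting that $w_0 \leq \lambda/(\lambda + \mathcal{K}(1/2))$, that $Y_i = g(X_i)$, and that
$\|g(x,u) - g(X_i)\| \leq Lh$ for $i \in \mathcal{I}$, the result follows from the triangle inequality. □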
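The bound in Lemma 4 can be checked numerically in one dimension. In this hedged example, chosen for illustration only, $g = \sin$ on $[0, 1]$ (so $L = 1$ and $M_g = \sin 1$), the samples are noiseless with spacing $0.1$ so that $S_h$ with $h = 0.2$ covers $[0, 1]$, and the biweight kernel is again an assumed choice.

```python
import math


def kernel(nu):
    """Biweight kernel (assumed choice satisfying (a)-(e))."""
    return (1.0 - nu * nu) ** 2 if abs(nu) < 1.0 else 0.0


def l2nw_1d(z, X, Y, h, lam):
    """Scalar L2NW estimate (22) at query z; requires lam > 0 or data within the bandwidth."""
    w = [kernel((z - x) ** 2 / h ** 2) for x in X]
    return sum(wi * yi for wi, yi in zip(w, Y)) / (lam + sum(w))


def lemma4_bound(lam, h, L, M_g):
    """Right-hand side of Lemma 4: mu*M_g + (1 - mu)*L*h with mu = lam / (lam + K(1/2))."""
    mu = lam / (lam + kernel(0.5))
    return mu * M_g + (1.0 - mu) * L * h


h, lam = 0.2, 0.05
X = [i / 10 for i in range(11)]        # every z in [0, 1] is within 0.05 < h/2 of a sample
Y = [math.sin(x) for x in X]           # noiseless Y_i = g(X_i) with g = sin, so L = 1
grid = [j / 200 for j in range(201)]   # evaluation points in Z = [0, 1]
max_err = max(abs(math.sin(z) - l2nw_1d(z, X, Y, h, lam)) for z in grid)
bound = lemma4_bound(lam, h, L=1.0, M_g=math.sin(1.0))
print(max_err <= bound)  # True
```

In this example the observed worst-case error is well below the bound, which is expected: Lemma 4 is a uniform worst-case statement over all Lipschitz $g$ with the given constants.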

Remark. The result shows that the L2NW estimator in our setup has bias $O(\lambda + h)$, since
$\mu \leq \lambda/\mathcal{K}(1/2)$. With $\lambda = O(h)$, this matches the bias $O(h)$ of the NW estimator in a standard
setup at both interior and boundary points [TT] [13].

Theorem 4. Assume that $S_{h_n}$ is a finite sample cover of $\mathcal{Z} \subseteq \mathcal{X} \times \mathcal{U}$ for some sequence
$h_n \to 0$, that $\lambda = O(h_n)$, and that $k$ is a sequence such that $T_u/k \to 0$ (see Theorem 2). Then the
L2NW estimator is uniformly consistent on $\mathcal{Z}$ and converges at rate

$$\sup_{(x,u) \in \mathcal{Z}} \|g(x,u) - \mathcal{O}_n(x, u; \hat{X}_i, \hat{Y}_i)\| = O(\lambda + h_n) + O_p(k^{-(r+1)/(2r+3)}). \qquad (24)$$

Proof. Using the triangle inequality, the left-hand side of (24) is bounded by

$$\sup_{(x,u) \in \mathcal{Z}} \|\mathcal{O}_n(x, u; \hat{X}_i, \hat{Y}_i) - \mathcal{O}_n(x, u; X_i, Y_i)\| + \sup_{(x,u) \in \mathcal{Z}} \|g(x,u) - \mathcal{O}_n(x, u; X_i, Y_i)\|. \qquad (25)$$

The first term is controlled by Corollary 1 and Theorem 2, and the second is governed by
Lemma 4. □
Remark. Ideally, we would like $\mathcal{Z} = \mathcal{X} \times \mathcal{U}$, but this does not always happen: it requires
that the trajectory of the system sufficiently explores the space, in a manner formalized by the
definition of a finite sample cover. A set $\mathcal{Z}$ which meets the assumptions of Theorem 4 always
exists, and this can be shown by construction: given any $n > 0$, let $\mathcal{Z} = \bigcup_{i=0}^{n-1} \{X_i\}$. A better
set $\mathcal{Z}$ is defined as the limit of a convergent subsequence of the $X_i$, and its advantage is that
the $X_i$ visit a neighborhood of the limit infinitely often. Such a limit is guaranteed to exist
by the Bolzano-Weierstrass theorem. These two constructions mean that there is always
some set on which the nonlinear dynamics $g(x,u)$ can be learned, and this set corresponds
to points which the trajectory visits.

With the previous theorem, we can now show that the control law of LBMPC with L2NW as the
oracle converges to that of an MPC that knows the unmodeled dynamics, when there is SE as
defined by the appropriate FSC.

Proof of Theorem 7 in [3]. The result directly follows from Theorem 6 in [3], since Theorem
4 shows that the L2NW estimator satisfies the appropriate convergence conditions. □

Remark. Note that in the case of no measurement noise, the asymptotic term $O_p(k^{-(r+1)/(2r+3)})$
drops out of the corresponding expressions above.